How Different Are Language Models and Word Clouds?
Abstract
Word clouds are a summarised representation of a document’s text, similar to tag clouds which summarise the tags assigned to documents. Word clouds are similar to language models in the sense that they represent a document by its word distribution. In this paper we investigate the differences between word cloud and language modelling approaches, and specifically whether effective language modelling techniques also improve word clouds. We evaluate the quality of the language model using a system evaluation test bed, and evaluate the quality of the resulting word cloud with a user study. Our experiments show that different language modelling techniques can be applied to improve a standard word cloud that uses a TF weighting scheme in combination with stopword removal. Including bigrams in the word clouds and a parsimonious term weighting scheme are the most effective in both the system evaluation and the user study.
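The baseline described above (TF weighting with stopword removal, optionally extended with bigrams) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the stopword list, the tokenizer, the function name `tf_cloud_weights`, and the choice to normalise counts over unigrams and bigrams together are all assumptions made for the example. The parsimonious weighting scheme is not shown.

```python
from collections import Counter
import re

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "and", "are", "by", "in", "is", "its", "of", "the", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_cloud_weights(text, top_k=5, use_bigrams=True):
    """Return the top_k (term, weight) pairs for a simple TF word cloud.

    Weights are relative frequencies over all counted terms. Bigrams are
    adjacent token pairs in which neither token is a stopword; sentence
    boundaries are ignored for simplicity.
    """
    tokens = tokenize(text)
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    if use_bigrams:
        for first, second in zip(tokens, tokens[1:]):
            if first not in STOPWORDS and second not in STOPWORDS:
                counts[f"{first} {second}"] += 1
    total = sum(counts.values())
    return [(term, count / total) for term, count in counts.most_common(top_k)]


sample = (
    "Word clouds represent a document by its word distribution. "
    "Word clouds are similar to language models."
)
weights = tf_cloud_weights(sample, top_k=3)
for term, weight in weights:
    print(f"{term}: {weight:.3f}")
```

In a rendered cloud, each weight would typically be mapped to a font size, so that a frequent bigram such as "word clouds" appears as a single unit rather than as two unrelated words.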